24 research outputs found

    Cross-Geography Generalization of Machine Learning Methods for Classification of Flooded Regions in Aerial Images

    Identification of regions affected by floods is a crucial piece of information for planning and managing post-disaster relief and rescue efforts. Traditionally, remote sensing images are analyzed to identify the extent of damage caused by flooding. Data acquired from sensors onboard earth observation satellites are analyzed to detect flooded regions, but this analysis can be limited by low spatial and temporal resolution. In recent years, however, images acquired from Unmanned Aerial Vehicles (UAVs) have also been used to assess post-disaster damage. A UAV-based platform can be rapidly deployed with a customized flight plan and minimal dependence on ground infrastructure. This work proposes two approaches for identifying flooded regions in UAV aerial images. The first approach uses texture-based unsupervised segmentation to detect flooded areas, while the second uses an artificial neural network on the texture features to classify images as flooded or non-flooded. In existing works, models are trained and tested on images of the same geographical regions. This work instead studies how well the proposed model identifies flooded regions across geographical regions. The proposed segmentation-based approach obtains an F1-score of 0.89, which is higher than that of existing classifiers. This robustness suggests that the approach can be used to identify flooded areas in any geographical region with minimal or no user intervention.
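
    The F1-score reported above is the harmonic mean of precision and recall for the flooded class. A minimal sketch follows. It assumes binary labels, 1 for flooded and 0 for non-flooded, evaluated per image or per pixel. The function name and the example labels are hypothetical and are not taken from the paper's code.

```python
from typing import Sequence

def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = flooded) over flat label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # flooded, detected
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed floods
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = flooded image, 0 = non-flooded image
truth = [1, 1, 1, 0, 0, 1, 0, 1]
pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(round(f1_score(truth, pred), 3))  # 0.8 (4 TP, 1 FP, 1 FN)
```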

    An automated essay evaluation system using natural language processing and sentiment analysis

    An automated essay evaluation system is a machine-based approach that uses a long short-term memory (LSTM) model to grade essays written in English. Natural language processing (NLP) is used to extract feature representations from the essays. The LSTM network learns from the extracted features and generates parameters for testing and validation. The main objective of the research is to propose an LSTM model and train it on a dataset of manually graded essays. Sentiment analysis classifies the sentiment of each essay as positive, negative, or neutral. The Twitter sample dataset is used to build a sentiment classifier that analyzes sentiment based on the student's approach to a topic. Each essay is also checked for syntactic errors and for plagiarism, which is used to assess its novelty. The overall grade is calculated from the quality of the essay, the number of syntactic errors, the percentage of plagiarism found, and the sentiment of the essay. The corrected essay is returned to the students as feedback. The essay grading model achieved an average quadratic weighted kappa (QWK) score of 0.911, and the sentiment analysis classifier achieved 99.4% accuracy.
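
    Quadratic weighted kappa measures agreement between human and model grades. Each disagreement is penalized by the squared distance between the two scores, normalized by the score range. A minimal sketch follows. It assumes integer scores in a known range [min_rating, max_rating]. The function name and the sample scores are illustrative and are not from the paper.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """QWK = 1 - sum(w * O) / sum(w * E), with w_ij = (i - j)^2 / (N - 1)^2."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("rating range must contain at least two values")
    # Observed matrix: observed[i][j] counts essays scored i by A and j by B.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1
    total = len(rater_a)
    hist_a = [sum(row) for row in observed]                             # row marginals
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]  # column marginals
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_a[i] * hist_b[j] / total  # chance agreement, same total as observed
            num += weight * observed[i][j]
            den += weight * expected
    if den == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1.0 - num / den

human = [2, 3, 4, 4, 1]  # hypothetical human grades on a 1-4 scale
model = [2, 3, 3, 4, 1]  # hypothetical model grades
print(round(quadratic_weighted_kappa(human, model, 1, 4), 3))  # 0.918
```

    With these hypothetical grades, a single one-point disagreement gives a kappa of 56/61. Perfect agreement would give 1.0.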

    PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature

    Scientific information extraction (SciIE), which aims to automatically extract information from scientific literature, is becoming more important than ever. However, there are no existing SciIE datasets for polymer materials, which are an important class of materials used ubiquitously in our daily lives. To bridge this gap, we introduce POLYIE, a new SciIE dataset for polymer materials. POLYIE is curated from 146 full-length polymer scholarly articles, which domain experts annotated with named entities (i.e., materials, properties, values, conditions) and their N-ary relations. POLYIE presents several unique challenges due to diverse lexical formats of entities, ambiguity between entities, and variable-length relations. We evaluate state-of-the-art named entity extraction and relation extraction models on POLYIE, analyze their strengths and weaknesses, and highlight some difficult cases for these models. To the best of our knowledge, POLYIE is the first SciIE benchmark for polymer materials, and we hope it will lead to more research efforts from the community on this challenging task. Our code and data are available at https://github.com/jerry3027/PolyIE.
    Comment: Work in progress

    A general-purpose material property data extraction pipeline from large polymer corpora using Natural Language Processing

    The ever-increasing number of materials science articles makes it hard to infer chemistry-structure-property relations from published literature. We used natural language processing (NLP) methods to automatically extract material property data from the abstracts of polymer literature. As a component of our pipeline, we trained MaterialsBERT, a language model, on 2.4 million materials science abstracts. When used as the text encoder, it outperforms other baseline models on three out of five named entity recognition datasets. Using this pipeline, we obtained ~300,000 material property records from ~130,000 abstracts in 60 hours. The extracted data were analyzed for a diverse range of applications, such as fuel cells, supercapacitors, and polymer solar cells, and yielded non-trivial insights. The data extracted through our pipeline are available through a web platform at https://polymerscholar.org, which can be used to conveniently locate material property data reported in abstracts. This work demonstrates the feasibility of an automatic pipeline that starts from published literature and ends with a complete set of extracted material property information.

    Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities

    As with other forms of AI, speech recognition has recently been examined for performance disparities across user cohorts. One approach to achieving fairness in speech recognition is to (1) identify speaker cohorts that suffer from subpar performance and (2) apply fairness mitigation measures that target the discovered cohorts. In this paper, we report initial findings on both the discovery and the mitigation of performance disparities, using data from a product-scale AI assistant speech recognition system. We compare cohort discovery based on geographic and demographic information with a more scalable method that uses speaker embedding technology to group speakers without human labels. For fairness mitigation, we find two effective measures: oversampling underrepresented cohorts, and modeling speaker cohort membership through additional input variables. Both reduce the gap between top- and bottom-performing cohorts without degrading overall recognition accuracy.
    Comment: Proc. Interspeech 202
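
    One simple form of oversampling resamples an underrepresented cohort's utterances with replacement until the cohort matches the size of the largest cohort. The abstract specifies neither that scheme nor the disparity metric. This sketch therefore assumes both the balancing scheme and word error rate (WER) per cohort as the measure of the gap between top- and bottom-performing cohorts. The data format, function names, and numbers are hypothetical.

```python
import random
from collections import Counter, defaultdict

def oversample_cohorts(utterances, seed=0):
    """Balance cohorts by resampling smaller ones with replacement.

    utterances: list of (cohort_id, utterance_id) pairs (hypothetical format).
    Every cohort is topped up to the size of the largest cohort.
    """
    if not utterances:
        return []
    rng = random.Random(seed)
    by_cohort = defaultdict(list)
    for cohort, utt in utterances:
        by_cohort[cohort].append((cohort, utt))
    target = max(len(items) for items in by_cohort.values())
    balanced = []
    for items in by_cohort.values():
        balanced.extend(items)  # keep every original utterance
        balanced.extend(rng.choices(items, k=target - len(items)))  # add duplicates
    return balanced

def performance_gap(wer_by_cohort):
    """Gap between the worst (highest WER) and best (lowest WER) cohort."""
    return max(wer_by_cohort.values()) - min(wer_by_cohort.values())

data = [("cohort_a", i) for i in range(6)] + [("cohort_b", i) for i in range(2)]
balanced = oversample_cohorts(data)
print(Counter(cohort for cohort, _ in balanced))  # both cohorts now have 6 items
print(performance_gap({"cohort_a": 0.08, "cohort_b": 0.12}))
```

    Duplicating minority-cohort data raises that cohort's weight in training without discarding any majority-cohort data. Undersampling the majority cohort would be the alternative.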

    Draft genome sequence of Sclerospora graminicola, the pearl millet downy mildew pathogen: Genome sequence of pearl millet downy mildew pathogen

    Sclerospora graminicola is one of the most important biotic constraints on pearl millet production worldwide. We report a de novo whole genome assembly and analysis of pathotype 1. The draft genome assembly contained 299,901,251 bp with 65,404 genes. Pearl millet [Pennisetum glaucum (L.) R. Br.] is an important crop of the semi-arid and arid regions of the world. It can grow in harsh and marginal environments and has the highest degree of tolerance to drought and heat among cereals (1). Downy mildew, caused by Sclerospora graminicola (Sacc.) Schroet., is the most devastating disease of pearl millet, particularly on genetically uniform hybrids. Estimated annual grain yield loss due to downy mildew is approximately 10-80% (2-7). Pathotype 1 has been reported to be a highly virulent pathotype of Sclerospora graminicola in India (8). We report a de novo whole genome assembly and analysis of Sclerospora graminicola pathotype 1 from India. The susceptible pearl millet genotype Tift 23D2B1P1-P5 was used to obtain single-zoospore isolates from the original oosporic sample. The library for whole genome sequencing was prepared according to the instructions for the NEB Ultra DNA library kit for Illumina (New England Biolabs, USA). The libraries were normalised, pooled and sequenced on the Illumina HiSeq 2500 (Illumina Inc., San Diego, CA, USA) platform at 2 x 100 bp read length. Mate pair (MP) libraries were prepared using the Nextera mate pair library preparation kit (Illumina Inc., USA). 1 µg of genomic DNA was subjected to tagmentation, followed by strand displacement. Size selection of the tagmented, strand-displaced DNA was carried out using AMPure XP beads. The libraries were validated on an Agilent Bioanalyzer using a DNA HS chip. The libraries were normalised, pooled and sequenced on the Illumina MiSeq (Illumina Inc., USA) platform at 2 x 300 bp read length.
    Whole genome sequencing on the Illumina HiSeq 2500 yielded 7.38 Gb (73,889,924 paired-end reads) from the paired-end library. Sequencing on the Illumina MiSeq yielded 1.15 Gb (3,851,788 reads) from the mate pair library. The sequences were assembled using several assemblers, including ABySS, MaSuRCA, Velvet, SOAPdenovo2, and ALLPATHS-LG. The assembly generated by the MaSuRCA (9) algorithm was superior to those of the other algorithms and was therefore used for scaffolding with SSPACE. The assembled draft genome sequence of S. graminicola pathotype 1 was 299,901,251 bp long with 47.2% GC content. It consisted of 26,786 scaffolds, with an N50 of 17,909 bp and a longest scaffold of 238,843 bp. The overall coverage was 40X. The draft genome sequence was used for gene prediction with AUGUSTUS. Assembly completeness was assessed with CEGMA, which found 92.74% of proteins completely present and 95.56% partially present. The BUSCO fungal dataset indicated 64.9% complete, 12.4% fragmented, and 22.7% missing out of 290 BUSCO groups. A total of 52,285 predicted genes were annotated using BLASTX, and 38,120 genes had a significant BLASTX match. Repetitive element analysis of the assembly revealed 8,196 simple repeats, 1,058 low-complexity repeats, and 5,562 dinucleotide to hexanucleotide microsatellite repeats.
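
    The assembly statistics above follow standard definitions. N50 is the length of the shortest scaffold among the largest scaffolds that together cover at least half of the total assembly length. GC content is the share of G and C among the called bases. A minimal sketch on toy inputs follows. These helpers are illustrative and are not the tools used in the study.

```python
def n50(lengths):
    """Return the N50 of a list of contig/scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T); N is ignored."""
    seq = seq.upper()
    acgt = sum(1 for base in seq if base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / acgt

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
print(round(gc_content("ATGCGCNN"), 1))   # 66.7
```

    In the toy example the total length is 54. The scaffolds of length 10, 9, and 8 reach 27, which is half the total, so the N50 is 8.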

    Comparison of Small Gut and Whole Gut Microbiota of First-Degree Relatives With Adult Celiac Disease Patients and Controls

    Recent studies on celiac disease (CeD) have reported alterations in the gut microbiome. It is not well understood whether this alteration in the microbial community is a cause or an effect of the disease, especially in adult-onset disease. First-degree relatives (FDRs) of CeD patients are genetically susceptible to CeD. They may therefore provide an opportunity to study the gut microbiome in a pre-disease state. Using 16S rRNA gene sequencing, we observed no significant differences in ecosystem-level diversity measures between the disease condition (CeD), pre-disease (FDR) and control subjects. However, differences were observed at the level of amplicon sequence variants (ASVs), suggesting alterations in specific ASVs between the pre-disease and diseased conditions. Duodenal biopsies showed greater differences in ASVs than fecal samples did, indicating larger disruption of the microbiota at the disease site. The duodenal microbiota of FDRs was characterized by significant abundance of ASVs belonging to the genera Parvimonas, Granulicatella, Gemella, Bifidobacterium, Anaerostipes, and Actinomyces. Compared with the FDR microbiota, the duodenal microbiota of CeD patients was characterized by higher abundance of ASVs from the genera Megasphaera and Helicobacter. Compared with the control group, the CeD and FDR fecal microbiota had reduced abundance of ASVs classified as Akkermansia and Dorea. In addition, the predicted functional metagenome showed a reduced ability to degrade gluten in the CeD fecal microbiota compared with FDRs and controls. The findings of the present study demonstrate differences in ASVs. They also predict a reduced ability of the CeD fecal microbiota, relative to the FDR fecal microbiota, to degrade gluten. Further research is required to investigate the strain-level and active functional profiles of the FDR and CeD microbiota, to better understand the role of the gut microbiome in the pathophysiology of CeD.
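
    The abstract does not name which ecosystem-level diversity measures were compared. This sketch therefore assumes one common alpha-diversity measure as an example: the Shannon index, H = -Σ p_i ln p_i, computed from ASV counts in a sample. The function name and the counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over ASVs with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

even_sample = [25, 25, 25, 25]  # hypothetical ASV counts, perfectly even
skewed_sample = [97, 1, 1, 1]   # same number of ASVs, one dominant ASV
print(round(shannon_index(even_sample), 3))  # 1.386 (= ln 4)
print(round(shannon_index(skewed_sample), 3))
```

    Both hypothetical samples contain the same four ASVs. The skewed sample still scores much lower, because the index reflects evenness as well as richness. This is one reason community-level measures can look similar while individual ASVs differ.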

    Managing a front-line field hospital in Libya: Description of case mix and lessons learned for future humanitarian emergencies

    Between June and August 2011, International Medical Corps deployed a field hospital near the front line of the fighting between government troops and opposition fighters in Western Libya. During that period, the field hospital cared for over 1300 combatants and non-combatants from both sides of the conflict, most of them with war-related injuries. Over 60% of battle-related injuries were shrapnel wounds and blast injuries from exploding small mortars. Smaller percentages were due to battle-related motor vehicle accidents, gunshot wounds, burns, and other causes. The most pertinent lessons learned from our experience were the following:
    - the importance of dedicating significant resources to logistics and supply chain management;
    - the benefits of building strong ties with the local community early in the field hospital's deployment;
    - the need to pay careful attention to basic principles of humanitarian ethics.

    A review of histopathological and immunohistochemical parameters in diagnosis of metastatic renal cell carcinoma with a case of gingival metastasis

    The oral cavity is a site of low prevalence for metastasis of malignant tumors. However, oral metastasis of renal origin is relatively more common and represents 2% of all cancer deaths. Renal cancer may metastasize to any part of the body, with a 15% risk of metastasis to the head and neck region, and it poses one of the greatest diagnostic challenges in the medical sciences. Approximately 25% of patients already have metastatic disease at initial assessment, and the metastasis is often what prompts the diagnosis in the first place. Here we present a review of the literature on renal cell carcinoma, along with a case of gingival metastasis.